-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-818891 Remove Arrow and other unneeded dependencies #497
Conversation
aa55d3c
to
cebefe0
Compare
Codecov Report
@@ Coverage Diff @@
## master #497 +/- ##
==========================================
- Coverage 76.97% 74.85% -2.13%
==========================================
Files 79 77 -2
Lines 5273 4764 -509
Branches 460 419 -41
==========================================
- Hits 4059 3566 -493
+ Misses 1024 1017 -7
+ Partials 190 181 -9
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
67bbdc7
to
e263fbb
Compare
e5e6cce
to
86cc28a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for cleanup!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sfc-gh-lsembera ! This is huge! Please make sure to ask Andrey or Gloria to take a look
allocator); | ||
} | ||
|
||
private BufferAllocator createBufferAllocator() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the memory allocation logic is inside the Parquet library? What if we want to use a different allocator per channel/table for usage tracking and better estimation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the memory allocation logic is inside the Parquet library?
Yes.
What if we want to use a different allocator per channel/table for usage tracking and better estimation?
That's a good point, we might need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need it, we can get it from git history, it is not a complicated code.
The code coverage decreases by 2.5%, could you make sure to add the tests for Parquet? |
// Unused (previously Arrow with per column compression. | ||
// TWO(2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can remove this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
allocator); | ||
} | ||
|
||
private BufferAllocator createBufferAllocator() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the memory allocation logic is inside the Parquet library?
Yes.
What if we want to use a different allocator per channel/table for usage tracking and better estimation?
That's a good point, we might need it.
{"Arrow", Constants.BdecVersion.ONE}, | ||
{"Parquet_w/o_optimization", Constants.BdecVersion.THREE} | ||
}); | ||
new Object[][] {{"Parquet_w/o_optimization", Constants.BdecVersion.THREE}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove this parameter logic as a whole, since it is just testing parquet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
86cc28a
to
0534b00
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Lukas!
int compressedChunkLength = compressionResult.getSecond(); | ||
Pair<byte[], Integer> paddedChunk = | ||
padChunk(chunkData, Constants.ENCRYPTION_ALGORITHM_BLOCK_SIZE_BYTES); | ||
byte[] compressedAndPaddedChunkData = paddedChunk.getFirst(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
byte[] compressedAndPaddedChunkData = paddedChunk.getFirst(); | |
byte[] paddedChunkData = paddedChunk.getFirst(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
Pair<byte[], Integer> paddedChunk = | ||
padChunk(chunkData, Constants.ENCRYPTION_ALGORITHM_BLOCK_SIZE_BYTES); | ||
byte[] compressedAndPaddedChunkData = paddedChunk.getFirst(); | ||
int compressedChunkLength = paddedChunk.getSecond(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int compressedChunkLength = paddedChunk.getSecond(); | |
int paddedChunkLength = paddedChunk.getSecond(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
public enum BdecVersion { | ||
/** Uses Arrow to generate BDEC chunks. */ | ||
ONE(1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider to comment it out for history, just to see what we used prev numbers for, like TWO(2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented out
@@ -1,30 +0,0 @@ | |||
JSch 0.0.* was released under the GNU LGPL license. Later, we have switched |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it Arrow related? how to know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR removes other dependencies, not just Arrow. This license file is removed because the jsch (an SSH client pulled in by hadoop-common) is being removed.
|
||
@RunWith(Parameterized.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove Parameterized
only in this test? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed from everywhere
@sfc-gh-tzhang I reviewed the codecov report and there aren't any new holes in coverage. The reason for lower coverage is that the some higher-covered Arrow code has been removed, which lowered the overall coverage. |
This PR removes Arrow and the some transitive dependencies of
hadoop-common
: Apache Curator (Zookeeper client),commons-logging
,jsch
(SSH client). The JAR size shrunk from ~49MB to ~41MB. Build runtime has been cut significantly.Testing: Dew tests passed